NSF PAR Search | NSF Public Access Repository

A Structure-Aware Framework for Learning Device Placements on Computation Graphs

Duan, Shukai; Ping, Heng; Kanakaris, Nikos; Xiao, Xiongye; Kyriakis, Panagiotis; Ahmed, Nesreen K; Zhang, Peiyu; Ma, Guixiang; Capotă, Mihai; Nazarian, Shahin; et al (December 2024, NeurIPS)

Computation graphs are Directed Acyclic Graphs (DAGs) where the nodes correspond to mathematical operations and are used widely as abstractions in optimizations of neural networks. The device placement problem aims to identify optimal allocations of those nodes to a set of (potentially heterogeneous) devices. Existing approaches rely on two types of architectures known as grouper-placer and encoder-placer, respectively. In this work, we bridge the gap between encoder-placer and grouper-placer techniques and propose a novel framework for the task of device placement, relying on smaller computation graphs extracted from the OpenVINO toolkit. The framework consists of five steps, including graph coarsening, node representation learning and policy optimization. It facilitates end-to-end training and takes into account the DAG nature of the computation graphs. We also propose a model variant, inspired by graph parsing networks and complex network analysis, enabling graph representation learning and jointed, personalized graph partitioning, using an unspecified number of groups. To train the entire framework, we use reinforcement learning using the execution time of the placement as a reward. We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT. The robustness of the proposed framework is also highlighted through an ablation study. The suggested placements improve the inference speed for the benchmark models by up to over CPU execution and by up to compared to other commonly used baselines.
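As a rough illustration of the reward signal described in this abstract, the sketch below estimates the execution time of a placement on a small DAG and uses its negative as a reinforcement learning reward. The cost model (a fixed per-node compute time per device plus a constant transfer penalty between devices) and the names `execution_time`, `reward`, and `comm_cost` are assumptions made for this example; they are not taken from the paper, which measures real execution time.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execution_time(preds, cost, placement, comm_cost=1.0):
    """Estimate the latency of running a DAG under a given placement.

    preds:     node -> list of predecessor nodes
    cost:      node -> {device: compute time on that device}
    placement: node -> device
    comm_cost: hypothetical constant penalty for a cross-device edge
    """
    device_free = {}  # time at which each device becomes idle
    finish = {}       # finish time of each node
    for node in TopologicalSorter(preds).static_order():
        dev = placement[node]
        ready = 0.0
        for p in preds.get(node, ()):
            transfer = comm_cost if placement[p] != dev else 0.0
            ready = max(ready, finish[p] + transfer)
        # A node starts once its inputs have arrived and its device is idle.
        start = max(ready, device_free.get(dev, 0.0))
        finish[node] = start + cost[node][dev]
        device_free[dev] = finish[node]
    return max(finish.values())


def reward(preds, cost, placement):
    """Faster placements receive a higher (less negative) reward."""
    return -execution_time(preds, cost, placement)


# Diamond-shaped toy graph: a -> b, a -> c, b -> d, c -> d
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
cost = {n: {"cpu": 2.0, "gpu": 1.0} for n in preds}
all_cpu = {n: "cpu" for n in preds}
mixed = {"a": "cpu", "b": "cpu", "c": "gpu", "d": "cpu"}
```

In this toy setting, moving one branch to a second device shortens the estimated latency even after paying the transfer penalty, which is the kind of trade-off a learned placement policy has to weigh.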
Full Text Available
Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks

https://doi.org/10.1109/ICASSP48485.2024.10447377

Cheng, Anzhe; Ping, Heng; Wang, Zhenkun; Xiao, Xiongye; Yin, Chenzhong; Nazarian, Shahin; Cheng, Mingxi; Bogdan, Paul (April 2024, IEEE)
Ko, Hanseok (Ed.)
Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do not mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchronously train network blocks. However, this approach often involves extensive trial-and-error iterations to determine the best configuration for local training. This includes decisions on how to decouple network blocks and which auxiliary networks to use for each block. In our work, we introduce a novel BP-free approach: a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize distinct sub-neural networks separately, where the global loss is only responsible for updating the output layer. The local error signals used in the BP-free model can be computed in parallel, enabling a potential speed-up in the weight update process through parallel implementation. Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations, outperforming models trained with end-to-end backpropagation and other state-of-the-art block-wise learning techniques on datasets such as CIFAR-10 and Tiny-ImageNet. The code is released at https://github.com/Belis0811/BWBPF.
Full Text Available
End-to-end programmable computing systems

https://doi.org/10.1038/s44172-023-00127-7

Xiao, Yao; Ma, Guixiang; Ahmed, Nesreen K; Capotă, Mihai; Willke, Theodore L; Nazarian, Shahin; Bogdan, Paul (November 2023, Communications Engineering)

Abstract Recent technological advances have contributed to the rapid increase in algorithmic complexity of applications, ranging from signal processing to autonomous systems. To control this complexity and endow heterogeneous computing systems with autonomous programming and optimization capabilities, we propose a unified, end-to-end, programmable graph representation learning (PGL) framework that mines the complexity of high-level programs down to low-level virtual machine intermediate representation, extracts specific computational patterns, and predicts which code segments run best on a core in heterogeneous hardware. PGL extracts multifractal features from code graphs and exploits graph representation learning strategies for automatic parallelization and correct assignment to heterogeneous processors. The comprehensive evaluation of PGL on existing and emerging complex software demonstrates a 6.42x and 2.02x speedup compared to thread-based execution and state-of-the-art techniques, respectively. Our PGL framework leads to higher processing efficiency, which is crucial for future AI and high-performance computing applications such as autonomous vehicles and machine vision.
A stochastic quantum program synthesis framework based on Bayesian optimization

https://doi.org/10.1038/s41598-021-91035-3

Xiao, Yao; Nazarian, Shahin; Bogdan, Paul (December 2021, Scientific Reports)
Abstract Quantum computers and algorithms can offer exponential performance improvement over some NP-complete programs which cannot be run efficiently through a Von Neumann computing approach. In this paper, we present BayeSyn, which utilizes an enhanced stochastic program synthesis and Bayesian optimization to automatically generate quantum programs from high-level languages subject to certain constraints. We find that stochastic synthesis can comparatively and efficiently generate a program with a lower cost from the high dimensional program space. We also realize that hyperparameters used in stochastic synthesis play a significant role in determining the optimal program. Therefore, BayeSyn utilizes Bayesian optimization to fine-tune such parameters to generate a suitable quantum program.
Full Text Available
From rumor to genetic mutation detection with explanations: a GAN approach

https://doi.org/10.1038/s41598-021-84993-1

Cheng, Mingxi; Li, Yizhi; Nazarian, Shahin; Bogdan, Paul (December 2021, Scientific Reports)
Abstract Social media have emerged as increasingly popular means and environments for information gathering and propagation. This vigorous growth of social media contributed not only to a pandemic (fast-spreading and far-reaching) of rumors and misinformation, but also to an urgent need for text-based rumor detection strategies. To speed up the detection of misinformation, traditional rumor detection methods based on hand-crafted feature selection need to be replaced by automatic artificial intelligence (AI) approaches. AI decision making systems are required to provide explanations in order to assure users of their trustworthiness. Inspired by the thriving development of generative adversarial networks (GANs) on text applications, we propose a GAN-based layered model for rumor detection with explanations. To demonstrate the universality of the proposed approach, we demonstrate its benefits on a gene classification with mutation detection case study. Similarly to rumor detection, gene classification can also be formulated as a text-based classification problem. Unlike fake news detection that needs a previously collected verified news database, our model provides explanations in rumor detection based on tweet-level texts only without referring to a verified news database. The layered structure of both generative and discriminative models contributes to the outstanding performance. The layered generators produce rumors by intelligently inserting controversial information in non-rumors, and force the layered discriminators to detect detailed glitches and deduce exactly which parts in the sentence are problematic. On average, in the rumor detection task, our proposed model outperforms state-of-the-art baselines on the PHEME dataset by 26.85% in terms of macro-f1. The excellent performance of our model for textual sequences is also demonstrated by the gene mutation case study, on which it achieves a 72.69% macro-f1 score.
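For readers unfamiliar with the macro-f1 metric quoted in this abstract, the following is a minimal sketch of how it is commonly computed: the F1 score is calculated separately for each class and the per-class scores are averaged with equal weight. The function name `macro_f1` and the toy labels are illustrative only and are not drawn from the paper or the PHEME dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two classes.
y_true = ["rumor", "rumor", "non-rumor", "non-rumor"]
y_pred = ["rumor", "non-rumor", "non-rumor", "non-rumor"]
score = macro_f1(y_true, y_pred)  # mean of 2/3 (rumor) and 4/5 (non-rumor)
```

Because every class contributes equally, macro-f1 penalizes a model that performs well only on the majority class.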
Full Text Available
An in silico deep learning approach to multi-epitope vaccine design: a SARS-CoV-2 case study

https://doi.org/10.1038/s41598-021-81749-9

Yang, Zikun; Bogdan, Paul; Nazarian, Shahin (February 2021, Scientific Reports)

Abstract The rampant spread of COVID-19, an infectious disease caused by SARS-CoV-2, all over the world has led to millions of deaths and has devastated social, financial and political entities around the world. Without an existing effective medical therapy, vaccines are urgently needed to avoid the spread of this disease. In this study, we propose an in silico deep learning approach for prediction and design of a multi-epitope vaccine (DeepVacPred). By combining the in silico immunoinformatics and deep neural network strategies, the DeepVacPred computational framework directly predicts 26 potential vaccine subunits from the available SARS-CoV-2 spike protein sequence. We further use in silico methods to investigate the linear B-cell epitopes, Cytotoxic T Lymphocytes (CTL) epitopes, and Helper T Lymphocytes (HTL) epitopes in the 26 subunit candidates and identify the best 11 of them to construct a multi-epitope vaccine for SARS-CoV-2 virus. The human population coverage, antigenicity, allergenicity, toxicity, physicochemical properties and secondary structure of the designed vaccine are evaluated via state-of-the-art bioinformatic approaches, showing good quality of the designed vaccine. The 3D structure of the designed vaccine is predicted, refined and validated by in silico tools. Finally, we optimize and insert the codon sequence into a plasmid to ensure the cloning and expression efficiency. In conclusion, this proposed artificial intelligence (AI) based vaccine discovery framework accelerates the vaccine design process and constructs a 694aa multi-epitope vaccine containing 16 B-cell epitopes, 82 CTL epitopes and 89 HTL epitopes, which is promising to fight the SARS-CoV-2 viral infection and can be further evaluated in clinical studies. Moreover, we trace the RNA mutations of the SARS-CoV-2 and ensure that the designed vaccine can tackle the recent RNA mutations of the virus.
There Is Hope After All: Quantifying Opinion and Trustworthiness in Neural Networks

https://doi.org/10.3389/frai.2020.00054

Cheng, Mingxi; Nazarian, Shahin; Bogdan, Paul (July 2020, Frontiers in Artificial Intelligence)
Full Text Available
CSrram: Area-Efficient Low-Power Ex-Situ Training Framework for Memristive Neuromorphic Circuits Based on Clustered Sparsity

https://doi.org/10.1109/ISVLSI.2019.00090

Fayyazi, Arash; Kundu, Souvik; Nazarian, Shahin; Beerel, Peter A.; Pedram, Massoud (July 2019, 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI))

Artificial Neural Networks (ANNs) play a key role in many machine learning (ML) applications but pose arduous challenges in terms of storage and computation of network parameters. Memristive crossbar arrays (MCAs) are capable of both computation and storage, making them promising for in-memory computing enabled neural network accelerators. At the same time, the presence of a significant amount of zero weights in ANNs has motivated research in a variety of parameter reduction techniques. However, for crossbar based architectures, the study of efficient methods to take advantage of network sparsity is still in the early stage. This paper presents CSrram, an efficient ex-situ training framework for hybrid CMOS-memristive neuromorphic circuits. CSrram includes a pre-defined block diagonal clustered (BDC) sparsity algorithm to significantly reduce area and power consumption. The proposed framework is verified on a wide range of datasets including MNIST handwritten recognition, fashion MNIST, breast cancer prediction (BCW), IRIS, and mobile health monitoring. Compared to state-of-the-art fully connected memristive neuromorphic circuits, our CSrram, with only 25% density of weights in the first junction, provides a power and area efficiency of 1.5x and 2.6x (averaged over five datasets), respectively, without any significant test accuracy loss.
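To make the idea of pre-defined block-diagonal sparsity concrete, the sketch below builds a generic block-diagonal 0/1 mask for a weight matrix and measures its density. This is only an illustration of the general pattern: the abstract does not spell out the BDC algorithm, and the function names `block_diagonal_mask` and `density` as well as the equal-sized blocks are assumptions of this example. With four equal blocks, the mask keeps one quarter of the weights, which matches the 25% density figure quoted above.

```python
def block_diagonal_mask(rows, cols, num_blocks):
    """Return a rows x cols 0/1 mask whose ones form num_blocks diagonal blocks."""
    if rows % num_blocks or cols % num_blocks:
        raise ValueError("rows and cols must be divisible by num_blocks")
    r, c = rows // num_blocks, cols // num_blocks
    # Entry (i, j) is kept only when row i and column j fall in the same block.
    return [[1 if i // r == j // c else 0 for j in range(cols)]
            for i in range(rows)]


def density(mask):
    """Fraction of weights that the mask keeps."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


mask = block_diagonal_mask(8, 4, 4)  # toy 8x4 layer split into 4 clusters
```

A mask like this would be fixed before training and multiplied element-wise with the layer's weights, so that only the within-cluster connections need to be mapped to hardware.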
Full Text Available
Taming Extreme Heterogeneity via Machine Learning based Design of Autonomous Manycore Systems

Bogdan, Paul; Chen, Fan; Deshwal, Aryan; Doppa, Janardhan Rao; Joardar, Biresh Kumar; Li, Hai; Nazarian, Shahin; Song, Linghao; Xiao, Yao (October 2019, Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion, CODES+ISSS 2019, part of ESWEEK 2019)

Full Text Available
Prediction-based fast thermoelectric generator reconfiguration for energy harvesting from vehicle radiators

https://doi.org/10.23919/DATE.2018.8342130

Yang, Hanchen; Kang, Feiyang; Ding, Caiwen; Li, Ji; Kim, Jaemin; Baek, Donkyu; Nazarian, Shahin; Lin, Xue; Bogdan, Paul; Chang, Naehyuck (March 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE))

Thermoelectric generation (TEG) has increasingly drawn attention for being environmentally friendly. A few studies have focused on improving TEG efficiency at the system level on vehicle radiators. The most recent reconfiguration algorithm improves performance but suffers from major drawbacks in computational time and energy overhead, and from non-scalability in terms of array size and processing frequency. In this paper, we propose a novel TEG array reconfiguration algorithm that determines a near-optimal configuration with an acceptable computational time. More precisely, with O(N) time complexity, our prediction-based fast TEG reconfiguration algorithm enables all modules to work at or near their maximum power points (MPP). Additionally, we incorporate prediction methods to further reduce the runtime and switching overhead during the reconfiguration process. Experimental results show a 30% performance improvement, an almost 100× reduction in switching overhead, and a 13× improvement in computational speed compared to the baseline and prior work. The scalability of our algorithm makes it applicable to larger scale systems such as industrial boilers and heat exchangers.
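As background for the maximum power point (MPP) mentioned in this abstract, the sketch below computes the MPP of each module in a single O(N) pass. It assumes the common linear model in which a module behaves as an open-circuit voltage source in series with an internal resistance, so maximum power is delivered at half the open-circuit voltage. The model, the function name `max_power_points`, and the sample values are assumptions of this example; the sketch does not reproduce the paper's reconfiguration algorithm.

```python
def max_power_points(modules):
    """Return (v_mpp, p_max) for each module in one pass over the array.

    modules: list of (v_oc, r_int) pairs, i.e. open-circuit voltage in volts
             and internal resistance in ohms (hypothetical linear model).
    """
    points = []
    for v_oc, r_int in modules:
        v_mpp = v_oc / 2                  # matched-load operating voltage
        p_max = v_oc * v_oc / (4 * r_int)  # power delivered at that point
        points.append((v_mpp, p_max))
    return points


modules = [(4.0, 2.0), (2.0, 1.0)]  # illustrative values only
points = max_power_points(modules)
ideal_power = sum(p for _, p in points)  # upper bound if every module sits at its MPP
```

The sum of the per-module maxima gives an upper bound against which a given array configuration can be compared.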
Full Text Available